Convergence of Contrastive Divergence with Annealed Learning Rate in Exponential Family
Abstract
In our recent paper, we showed that in the exponential family, contrastive divergence (CD) with a fixed learning rate gives asymptotically consistent estimates [11]. In this paper, we establish the consistency and convergence rate of CD with an annealed learning rate $\eta_t$. Specifically, suppose CD-$m$ generates the sequence of parameters $\{\theta_t\}_{t \ge 0}$ from an i.i.d. data sample $X_1^n \sim p_{\theta^*}$ of size $n$; then $\delta_n(X_1^n) = \limsup_{t \to \infty} \left\| \sum_{s=t_0}^{t} \eta_s \theta_s \big/ \sum_{s=t_0}^{t} \eta_s - \theta^* \right\|$ converges in probability to 0 at a rate of $1/\sqrt[3]{n}$. The number $m$ of MCMC transitions in CD affects only the constant factor of the convergence rate. Our proof is not a simple extension of the one in [11], which depends critically on the fact that $\{\theta_t\}_{t \ge 0}$ is a homogeneous Markov chain conditional on the observed sample $X_1^n$. Under an annealed learning rate, the homogeneous Markov property no longer holds, and we instead develop an alternative approach based on supermartingales. Experimental results of CD on a fully visible $2 \times 2$ Boltzmann machine are provided to illustrate our theoretical results.
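To make the setting concrete, below is a minimal sketch (not the paper's code) of CD-$m$ with an annealed learning rate $\eta_t = c/(t + t_0)$ on a fully visible $2 \times 2$ Boltzmann machine, returning the $\eta$-weighted average $\sum_s \eta_s \theta_s / \sum_s \eta_s$ analysed above. The schedule constants, batch size, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # a 2x2 fully visible Boltzmann machine has 4 binary units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(v, W, b):
    # One systematic Gibbs sweep; W is symmetric with zero diagonal,
    # so W[i] @ v contains no self-interaction term.
    v = v.copy()
    for i in range(d):
        v[i] = float(rng.random() < sigmoid(W[i] @ v + b[i]))
    return v

def suff_stats(V):
    # Mean sufficient statistics: pairwise products and unit means.
    S = V.T @ V / len(V)
    np.fill_diagonal(S, 0.0)
    return S, V.mean(axis=0)

def cd_m(X, m=1, T=20000, t0=100, c=0.5, batch_size=64):
    # CD-m with annealed rate eta_t = c / (t + t0); X is an (n, 4)
    # array of 0/1 samples. Returns the eta-weighted average
    # sum_s eta_s * theta_s / sum_s eta_s of the parameter iterates.
    W, b = np.zeros((d, d)), np.zeros(d)
    W_bar, b_bar, Z = np.zeros((d, d)), np.zeros(d), 0.0
    for t in range(T):
        eta = c / (t + t0)
        batch = X[rng.integers(len(X), size=batch_size)].astype(float)
        S_pos, mu_pos = suff_stats(batch)
        # Negative phase: m Gibbs sweeps started from the data batch.
        V = batch
        for _ in range(m):
            V = np.array([gibbs_sweep(v, W, b) for v in V])
        S_neg, mu_neg = suff_stats(V)
        # CD-m update: positive minus negative sufficient statistics.
        W += eta * (S_pos - S_neg)
        b += eta * (mu_pos - mu_neg)
        W_bar += eta * W
        b_bar += eta * b
        Z += eta
    return W_bar / Z, b_bar / Z
```

Note that the consistency result above is stated for this weighted average of the iterates, not for the last iterate $\theta_T$.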
Similar resources
Learning with Blocks: Composite Likelihood and Contrastive Divergence
Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known contrastive divergence algorithm. In particular, we show that composite likelihoods can be stochastically optimized b...
Investigating Convergence of Restricted Boltzmann Machine Learning
Restricted Boltzmann Machines are increasingly popular tools for unsupervised learning. They are very general, can cope with missing data and are used to pretrain deep learning machines. RBMs learn a generative model of the data distribution. As exact gradient ascent on the data likelihood is infeasible, typically Markov Chain Monte Carlo approximations to the gradient such as Contrastive Diver...
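As the snippet notes, exact gradient ascent on the RBM likelihood is intractable, so CD-$k$ replaces the model expectation with statistics from $k$ steps of block Gibbs sampling started at the data. A minimal sketch of that gradient estimator for a binary RBM follows; the function name and array layout are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_grad(W, a, b, v0, k=1):
    # CD-k gradient estimate for a binary RBM: weights W (nv x nh),
    # visible biases a, hidden biases b; v0 is an (n, nv) data batch.
    h_prob = sigmoid(v0 @ W + b)      # positive-phase hidden probabilities
    pos = v0.T @ h_prob               # positive-phase statistics
    v = v0
    for _ in range(k):                # k steps of block Gibbs sampling
        h = (rng.random(h_prob.shape) < h_prob).astype(float)
        v = (rng.random(v0.shape) < sigmoid(h @ W.T + a)).astype(float)
        h_prob = sigmoid(v @ W + b)
    neg = v.T @ h_prob                # negative-phase statistics
    n = len(v0)
    dW = (pos - neg) / n              # ascent direction on the log-likelihood
    da = (v0 - v).mean(axis=0)
    db = (sigmoid(v0 @ W + b) - h_prob).mean(axis=0)
    return dW, da, db
```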
Learning Rotation-Aware Features: From Invariant Priors to Equivariant Descriptors Supplemental Material
The R-FoE model of Sec. 3 of the main paper was trained on a database of 5000 natural images (50 × 50 pixels) using persistent contrastive divergence [12] (also known as stochastic maximum likelihood). Learning was done with stochastic gradient descent using mini-batches of 100 images (and model samples) for a total of 10000 (exponentially smoothed) gradient steps with an annealed learning rate...
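Persistent contrastive divergence (stochastic maximum likelihood) differs from plain CD only in that the negative-phase Gibbs chains ("fantasy particles") are carried over between parameter updates rather than restarted at the data. A skeletal sketch under an annealed learning rate, with hypothetical grad_fn and gibbs_fn hooks standing in for the model-specific pieces:

```python
import numpy as np

rng = np.random.default_rng(0)

def pcd_train(X, theta, grad_fn, gibbs_fn, n_chains=100,
              batch_size=100, T=10000, eta0=0.01, decay=1e-4):
    # Persistent CD: the negative-phase chains survive across updates
    # instead of being restarted at the data, as in plain CD.
    # grad_fn(theta, batch, fantasy) -> stochastic log-likelihood gradient;
    # gibbs_fn(theta, fantasy) -> chains advanced by one Gibbs step.
    fantasy = X[rng.integers(len(X), size=n_chains)].copy()
    for t in range(T):
        eta = eta0 / (1.0 + decay * t)      # annealed learning rate
        batch = X[rng.integers(len(X), size=batch_size)]
        fantasy = gibbs_fn(theta, fantasy)  # persistent negative phase
        theta = theta + eta * grad_fn(theta, batch, fantasy)
    return theta
```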
Cystoscopy Image Classification Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, despite the high prevalence of bladder cancer worldwide, no smart method has been applied to medical image processing for its diagnosis through cystoscopy images. In this paper, two well-known convolutional neural networks (CNNs) ...
The Convergence of Contrastive Divergences
This paper analyses the Contrastive Divergence algorithm for learning statistical parameters. We relate the algorithm to the stochastic approximation literature. This enables us to specify conditions under which the algorithm is guaranteed to converge to the optimal solution (with probability 1). This includes necessary and sufficient conditions for the solution to be unbiased.
Journal: CoRR
Volume: abs/1605.06220
Publication year: 2016